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Abstract 

A new algorithm, called the Minimal Residual 
QR algorithm, is presented to solve subset regres- 
sion problems. It is shown that this new scheme 
can be used as a numerically reliable implementa- 
tion of the stepwise regression technique, which 
is widely used to Identify an aerodynamic model 
from flight test data. This capability as well as 
the numerical superiority of this scheme over the 
stepwise regression technique is demonstrated in 
an experimental simulation study. 


1 . Introduction 

This paper describes a study of the so-called 
subset regression problem (SRP). This problem 
occurs in the identification of a mathematical 
model representing the aerodynamic forces and 
moments which act on an aircraft as a function of 
1} aircraft state quantities, such as angle of 
attack and Mach number and 2) aircraft input quan- 
tities, such as control surface deflections. This 
mathematical model is referred to as the air- 
craft’s aerodynamic model. In Refs. 1-3, precise 
conditions have been stated which allow formula- 
tion of this identification problem as an SRP. 

In this paper, we focus on the numerical 
techniques used to solve SRP. Generally, tech- 
niques for solving SRP are divided into two dif- 
ferent classes: 

1) A first class is formulated in a complete 
statistical framework. The techniques in this 
class are generally referred to as subset regres- 
sion methods and are widely used by practicing 
engineers and econometrists. 


is called the Minimal Residual QR (MRQR) algo- 
rithm. Although the MRQR scheme is derived from a 
numerical analysis point of view and therefore 
belongs in the second class mentioned above, it 
will be shown that it combines the advantages of 
techniques from both classes. On the one hand, it 
retains the precision of the original data such as 
the SVD and on the other hand can be used In a 
complete similar way as the stepwise regression 
technique (SRT), since it produces quantities such 
as sequential F- tests, etc. 

The outline of the paper is as follows. In 
Section 2, the new scheme will be described. Its 
relationship with the existing SRT is indicated in 
Section 3 t and Section 4 presents the result of a 
comparison study between the MRQR and the SRT. 
Finally, Section 5 presents the conclusions of 
this research. 


2. me Minimal Hesiduai QR Algorithm 

This new scheme is originally proposed in 
Ref. 7. In this section, we review this scheme to 
reveal the relationship with the SRT. 

The MRQR algorithm performs a QR factoriza- 
tion with column pivoting of the system matrix 
A in the considered SRP. If this SRP is denoted 
as: 

mini Ax - bl^ ( 1 ) 

x 

with I * I ^ representing the Euclidean norm, and 
ft e R mxn (m > n) , b € R m , x e R n 

and 


2) A second class has its basis mainly in 
numerical analysis. Currently the most widely 
used technique from this class Is the Singular 
Value Decomposition (SVD) method. ^ 


The SVD has clearly demonstrated Its numeri- 
cal superiority over the techniques from the first 
class. However, It does not do the subset selec- 
tion from the originally defined model parameters. 
This is a major drawback for the aerodynamic model 
identification problem, since the original model 
parameters have a physical interpretation. 

In order to overcome this drawback, a new 
algorithm has been developed. This new technique 


rank(A) = k s n 


then we can write the result of MRQR as 

mlnlQ T ft*(ii T x) - Q T bl 2 (2) 

X 

where Q Is an orthogonal transformation matrix, 
l.e.» Q^Q = I and w is the column permutation 
matrix. The operation indicated by Eq. (2) 
results in 


min 

x 


[ R 11 

R 12 

0 

r 22 ] 




(3) 
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where is kxk and upper triangular, 

Is kx(n-k) and e R k . From the decomposition 
of Eq. (3), the solution to Eq. (1), which now Is 
no longer a minimal norm soLution, becomes 


1 


*i 1 R ii b i 


(4) 


and the 2-norm of the residual is given by 

The crucial part In this computational scheme 
[Eqs. (2-4)] is the generation of the column per- 
mutation matrix *. The way this is done in the 
MRQR scheme is now illustrated for the simple two- 
dimensional case. 


2. 1 The Two-Dimensional Case 

In this case, the data of problem (1), i.e., 
the two-column vectors of A and the right hand 
side (rhs) b f are graphically represented In 
Fig. 1. 

Just as with the SRT, the MRQR algorithm 
selects one column vector of A at a time. The 
measure initially proposed in Ref. 7 for this 
selection is "the distance of the rhs to the 
column vectors of A." 


1 


In Fig. 1, these distances are denoted by 
m^, where the superscript indicates that we are 
selecting the first column and the subscript indi- 
cates the corresponding column position in the 
A-matrix. The column that will be selected by the 
MRQR algorithm is the one "closest” to the rhs. 
This corresponds to finding the minimum of the 
( Im^l } , which for the case of Fig. 1 

i Li i i 1 


sequence 
results in 


the selection of a 


r 


This column is 


then permuted to the first column position of 
A. In the next step, the same selection procedure 
is repeated for the components of the remaining 
columns of A and |jhe component of the rhs that 
is orthogonal to aj. These components are rep- 


resented in Fig, 1 by aZ and b 


respectively. 


T)je selection in the orthogonal complement of 
a 1 Is obvious now, since Im^l = 0. 

The quantities Imjl 
"residuals" because each 
the least squares problem 


will be referred to as 
is the residual of 


minlaft - b 1 !- 

(Ref. 7). The selection of the minimum of that 
sequence {lm*|} gave ri *e to the algorithmic 
name, Minimal Residual QR algorithm. 

2.2 Generalization of the Two-Dimensional Case 


1) Select column vector of A 1 closest to 
b*,^based on the computed sequence of residuals 
(lm*H. This column is called a*. 

2) Interchange the Jth and ith column 
vector of A* 

«i. 


by the column permutation matrix 


3) Perform an orthogonal projection Qp 
such that 


r B (i-n 

R u 

D (l-1)x(n-l+1) ' 
R 12 

0 

a 1 a 1 a 1 

a i ••• a J ••• a nJ 


[r (1) 

R 11 

-(i)x(n-i) 

R 12 


1+1 1+1 

0 

a i+1 a n 

r R (i > 

_(l)x(n-i) 


n 

12 


0 

A U ' 



( 6 ) 


and 


V 


i+1 


H 


><i> 


where R^' is an ixi upper-triangular matrix 
and Is an i*J rectangular matrix. 


4) Rank determination test. 


END 


The rank determination in the fourth step of 
the loop, i.e., rejecting one of the columns 

of A, can be done in various ways. In this 
paper, this determination will be based on the 
so-called partial F-test quantities, such as 
those used in SRT. 1 * This is outlined in 
section 3. 

2.3 Implementation Note on the MRQR Algorithm 

An efficient way to calculate the residuals 
|m*I and to perform the orthogonal transforma- 
tions Q t in Eq. (6) is described in Ref. 7. 

Here we initially transform problem (1) to: 


The generalization of the two-dimensional 
case, given in pseudo-programming-language form, 
is summarized in the following algorithm. 

Algorithm 1 : 

Define: 

A = A 1 = [a] ... a! ... a 1 ) and b 1 = b 

■ j i 

rank = n 
DO i = 1 :n, 


minlQ^Ax - 0^bl 2 



(7) 


where R is an nxn upper-triangular matrix 
and b 1 e R n+1 . 


This compression of the data can be done 
sequentially, such as with the construction of the 
"normal” equations (see section 3). This sequen- 
tial ability also makes this scheme attractive 
from the data storage point of view. 


2 



Next, Algorithm 1 is applied to the right- 
hand side of Eq. (7). A combination of these 
intermediate steps would then result in the fol- 
lowing reformulation of problem (1). 

minlfQ^ ... Q^JQ^Af Wj ... * k ]([*j ... » k J T x) 

- K ••• Q >;bl 2 (8) 

T T 

where (Q^ ... Q 1 ] will be denoted as Q^ and 
[* ] . . .' * k ) as n. 

Note : The robustness properties of the MRQR 

algorithm rely on the use of orthogonal transfor- 
mations, denoted in Eq. (8) by %. Og* and *, 
respectively. These do not modify the error pat- 
tern present on the original data. 

3. Relationship of the MRQR Algorithm with 
Stepwise Regression 

The classical stepwise regression technique 
(SRT) soLves problem (1) via the so-called normal 
equations. 


T T 

(A A]x = A b 


(9) 


From this set of equations the partial F-test 
quantities, the partial correlation coefficients, 
and other quantities are computed. 14 These 
latter quantities form the crucial information 
source in adding (or subtracting) a new variable 
x^ in the regression model. The partial F-test 
quantities are defined next. 


Definition 1 . When in the construction of 
the regression model, we add parameter x t to the 
model already containing 1-1 parameters, the 
F-test value for that parameter x^ is: 


F 


x 


i 


T T 

^-^l - 1 ~ c ll l (m _ i} 

e i e i 


( 10 ) 


where c. are the residuals of regression 

models containing i-1 and l parameters, 
respectively. 

The quantity defined in Eq. (10) can also be 
derived from the MRQR algorithm. In order to 
clarify this, let us focus at the ith stage of 
the do-loop of algorithm 1. The partial F-test 
value for each parameter x^ not yet In the model 
becomes : 


ib 1 .* 


Imjl* 


|-M 


(m - i) 


( 11 ) 


The difference of the two norms in Eq. (11) 
does not have to be computed explicitly; It 
becomes available during the computation of^ 

we denote this difference by ld t l 2 , 
then Eq . (11) becomes 



(m - i) 


( 12) 


The F-test quantities now allow us to make a 
decision about the statistical significance of 
adding parameter x^ to the model. Statistical 
significance corresponds to the F-test values 
being above a threshold F q , which is taken from 
statistical tables about the F-d istr ibut ion 
(e.g., Ref. 4). From Eq. (12), we clearly see 
that we can impose this test directly on tbe ele- 
ments of the derived residual sequence in 

the MRQR algorithm. Based on the desired thresh- 
old F . such a test would become 
a T 




idp 2 yurr 


i) 




(13) 


This Inequality precisely reveals the rela- 
tionship between the MRQR and SRT. It indicates 
that the same parameter x t would be added to the 
model whether based on the {lm*l 2 } sequence or the 
(F y ] sequence. 

This section summarizes the use of the MRQR 
algorithm as a robust implementation of the clas- 
sical stepwise regression scheme. Furthermore, as 
demonstrated in Ref. 7, the MRQR algorithm gives 
rise to additional parameters, such as smallest 
singular values, that may be helpful in construct- 
ing the regression model. In this paper, such an 
additional feature is presented in section 4.2. 


4 . Experimental Evaluation 

4. 1 Demonstration of the Equivalence between MRQR 
and SRT 

In this first example, the data taken from 
Ref. 9 (p. 647) and analyzed in Ref. 4 via the 
classical SRT are analyzed by the proposed MRQR 
algorithm. These data, summarized in Table 1, 
comprise four candidate solutions (l.e., the 
columns of the A matrix), and a right-hand side 
(the b-vector), from which 13 samples have been 
recorded. Hence, m s 13 and n = 4. In this 
section the candidate solutions will be referred 
to by their corresponding component of the 
x-vector, defined in Eq. (1), The results of the 
MRQR algorithm are summarized in Table 2. Compar- 
ison of the partial F-test values derived from 
MRQR, denoted by Fj, with those derived from SRT, 
denoted by F 2 , clearly demonstrates the equiva- 
lence of both schemes. Furthermore, we also 
observe that using the {|m*l 2 } sequence results in 
the same model as using the partial F-test 
values. 

4.2 An Additional Feature of the MRQR 

In analyzing real data, visual inspection of 
time-history plots of the available data is often 
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used. Such plots of the candidate solutions and 
the rhs might allow (or influence) the decision of 
which parameter to include in the regression model 
or might help to Judge the validity of the 
extracted model. If the same graphical infor- 
mation for all subsequent decision stages Is 
desired, it is generally necessary to store the 
original A-matrix and explicitly compute the 
residuals of the columns not yet in the model. 

In- the 1th decision stage, the matrix A 
is partitioned as 

A s (A 1 ' 1 | A n " l+1 ] ( 14) 


where A*" designates the already-selected 
columns, then the residuals of the columns in 
A n-U1 become [A 1 ” x^ ^ - A n ” * ] . These resid- 
uals can also be computed from the data available 
from the MRQR algorithm without explicitly storing 
the A-matrix. Using Eqs. (6-8) they become 




*i-1 



(15) 


Here only [Qj ... Q^l is stored explicitly, 
whereas the information to construct 0^ can be 
stored in the lower triangular part of (R|0] T in 
Eq. (7).*° In this way, use of Eq. (15) becomes a 
very reliable and storage-efficient way to evalu- 
ate residuals in the regression analysis. 

This feature is demonstrated using the fol- 
lowing example taken from Ref. 11 (p. ID-4.7). 

Here the original plant was generated by the fol- 
lowing static model. 

y(t) = 1.2 sin(t) ♦ 2 sin(.7t) + 3 cos(t) 

- 0. 1 (sin(2t) + «] - 2 sin(.99t) + e (16) 

where 6 and e are zero-mean white noise 
sequences with standard deviations of 0.001 and 
0.1, respectively. 

In the regression analysis, the following 
model was postulated. 

b(t) = x ^ sin(t) + Xg sin(.7t) > x^ cos(t) 

♦ x 4 cos(3t) (17) 

The four candidate solutions (sin(t), 
sin(.7t), cos(t), cos(3t)] and y(t) of Eq. (16) 
are shown in Fig. 2. Clearly, from this figure 
candidate solution x^ appears most closely 
related to the rhs, as was also demonstrated by 
the F-test values or the I * m t * 2 ) residuals * 

After this selection, the remaining parts of the 
other candidate solutions as well as the remaining 
part of the rhs are shown in Fig. 3. 


Next the candidate solution was 

selected. Again, Fig. 4 displays the residuals of 
the remaining columns and the rhs. From this 
figure the "high” correlation with x 1 becomes 
clear. The residuals of the remaining signals, 
i.e., rhs and x^, are displayed in Fig. 4 . 

From this analysis, we clearly observe that 
the residuals of each of the candidate solutions 
after each selection are nearly the same as the 
candidate solutions before the selections. This 
is a clear indication of "orthogonality” of the 
candidate solutions, which is also attributed to 
the value of the condition number of the matrix 
containing the original four candidate solutions. 
The latter is *2, which clearly is close to 1. 

This example demonstrates one feature of this 
"residual” analysis, but practical experience 
might provide additional uses. 

4.3 Numerical Superiority of the MRQR 

In the previous two examples, the condition 
number of the original A-matrix, as defined in 
Eq. (1), was very close to 1. For these cases, 
the F-tests computed from the MRQR algorithm or 
the SRT are completely identical. For cases where 
the condition number is larger, this numerical 
equivalence may be lost, as is demonstrated by the 
following least-squares problem: 




’1 

1 

1 


3 


min 


e 

0 

0 

x - 

€ 


X 


0 

E 

0 


e 




0 

0 

E 


t. 



where t is a very small number. The normal 
equations are 


i ♦ t 2 i i 


3 ♦ e 2 

1 + e 2 1 

X = 

3 * e 2 

1 1 I.e 2 


b ♦ < 2 J 


Q 

When e is taken, for example, equal to 1CT°, 
with a machine precision of 10" ^ , the matrix 
A T A In Eq. (19) becomes singular and clearly 
destroys the accuracy of the F-test values. With 
the use of the MROR algorithm and the same machine 
precision of 10”^ , this does not occur. 


5. Conclusion 

A new technique to solve subset regression 
problems has been presented. The technique is 
called the minimal residual OR algorithm (MRQR). 
Basically, It performs a QR factorization with 
column pivoting. It has been shown analytically 
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that the MRQR algorithm can be used as a numeri- 
cally stable implementation of existing stepwise 
regression techniques. The numerical stability is 
demonstrated in an experimental evaluation study. 

From this technique, a reliable solution of 
subset regression problems has been derived that 
allows the use of statistical parameters commonly 
used in classical solutions, such as the stepwise 
regression technique. This new technique should 
allow more Accurate" models to be constructed, 
especially when a high correlation exists between 
candidate solutions in the model. 
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Table 1 Data taken from Ref. 9 (p. 647) in 
example 1 . 


Xl 

*2 


X4 

RHS 

7.0 

26.0 

6.0 

60.0 

78.5 

i.O 

29.0 

15.0 

52.0 

74.3 

11.0 

56.0 

8.0 

20.0 

104.3 

n.o 

31.0 

8.0 

47.0 

87.6 

7.0 

52.0 

6.0 

33.0 

95.9 

n.o 

55.0 

9.0 

22.0 

109.2 

3.0 

71.0 

17.0 j 

6.0 

102.7 

1.0 

31.0 

22.0 

44.0 

72.5 

2.0 

54.0 

18.0 

22.0 

93.1 

21.0 

47.0 

4.0 

26.0 

115.9 

1.0 

40.0 

23.0 

34.0 

83.8 

11.0 

66.0 

9.0 

12.0 

113.3 

10.0 

68.0 

8.0 

12.0 

109.4 
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Table 2 (Retyped) computer output of MRQR on the 
data of Ref. 9 (p. 647). 


A. SE LECTING THE FIRST COLUMN FOR ENTRY 


SEL. # 

CAND. SOL. 

ism 

F , 

Ft 

1 


35.5764914 

12.6025177 

12.6025150 

2 

*2 

30.1054204 

21.9606047 

21.9606160 

3 

*3 

44.3862470 

4.4034168 

4.4034159 

4 

*4 

29.7298993 

22.7985202 

22.7985270 


FOR E NTRY 


SEL. # 

CAND. SOL. 

IUJ8HI 

Fi 

Ft 

1 

*2 

6.9262346 

5.0258650 

5.0258974 

2 

^3 

7.1299448 

4.2358460 

4.2358519 


WHICH SEL # [ENTER 0 TO QUIT SELECTION]: 1 


WHICH SEL # [ENTER 0 TO QUIT SELECTION]: 3 
DELETING A COLUMN FROM THE FIRST 1 SELECTED 


SEL. # 

CAND. SOL. 

F 1 

F 2 

1 

*4 

22.7985202 

22.7985270 


WHICH SEL # [ENTER 0 TO SKIP]: 0 


DELETING A COLUMN FROM THE FIRST 3 SELECTED 


SEL. # 

CAND. SOL. 

Ft 

Ft 

1 

*4 

1.8632624 

1.8632548 

2 

*1 

154.0076353 

154.0080400 

3 

x 2 

5.0258646 

5.0258974 


WHICH SEL # [ENTER 0 TO SKIP): 0 


D. SELE C TING . TH E. S ECQ.ND .CO LU M _N__F,Q R .ENTRY 




ibhu 

Ft 

IMM1 

1 

*1 

8.6465085 

108.2239145 

108.2238900 

2 

x 2 

29.4767726 

.1724839 

.1724847 

3 

*3 

13.2566210 

40.2945810 

40.2945430 


WHICH SEL # [ENTER 0 TO QUIT SELECTION]: 1 


DELETING A COLUMN FROM THE FIRST 2 SELECTED 


SEL. # 

CAND. SOL. 

a 

Ft 

1 

* 4 

159.2952101 

159.2952400 

2 

*1 

108.2239145 

108.2238900 


WHICH SEL # [ENTER 0 TO SKIP]: 0 


D. SJEXILCXIJNJ^TH * UM N_EQR-E NXR Y 


SEL. # 

CAND. SOL. 

mm 

F\ 

■s 

1 

*3 

6.9183549 




WHICH SEL # [ENTER 0 TO QUIT SELECTION]: O 


SUMMARY 

RANK ESTIMATE: 3 

PIVOT INFORMATION: 

ORIGINAL COLUMN 4 PIVOTED TO COLUMN l 
ORIGINAL COLUMN ! PIVOTED TO COLUMN 2 
ORIGINAL COLUMN 2 PIVOTED TO COLUMN 3 
ORIGINAL COLUMN 3 DELETED 
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Fig. 1 The MROR algorithm for the two-dimensional 
case (m = 2, n = 2). 



01 23456789 10 


TIME 

Fig. 2 Time-history plots of the candidate solu- 
tions and right-hand side before first selection 
in the MRQR. 
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CANDIDATE SOLUTIONS AFTER FIRST SELECTION 


CANDIDATE SOLUTIONS AFTER SECOND SELECTION 




Fig. 3 Time-history plots of the candidate solu- 
tions arid right-hand side after first selection ir 
the MRQR. 


Fig. 'I Time-history plot3 of the candidate solu- 
tions and right-hand side after second selection 
in the MRQR. 


4 r 


CANDIDATE SOLUTION AND RIGHT HAND 
SIDE AFTER THIRD SELECTION 


3 - 
2 - 


RESIDUAL 
X 4 rhs 


1 

0 

-1 




V 


V 


-2 \- 


-3 - 

-4 1 1 j — T — i i j.. i t i » 

012 3 456789 10 

TIME 


Fig. 5 Time-history plots of the candidate solu- 
tions and right-hand side after third selection in 
the MRQR. 
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